This tutorial aims at showing how to use the gProfiler2 R package. It is more or less a Rmd version of the vignette. You can refer to the vignette and reference manual (links below) for more details.
R package : https://cran.r-project.org/web/packages/gprofiler2
Reference manual: https://cran.r-project.org/web/packages/gprofiler2/gprofiler2.pdf
Vignette: https://cran.r-project.org/web/packages/gprofiler2/vignettes/gprofiler2.html
Here is an example in Homo sapiens with various types of gene identifiers, a SNP id, chromosomal intervals and a term ID from the Gene Ontology. Parameters are described in the vignette. The result is a named list where “result” is a data.frame with the enrichment analysis results and “meta” contains a named list with all the metadata for the query.
# Example of gene list functional enrichment analysis with gost
gostres <- gost(query = c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"),
organism = "hsapiens", ordered_query = FALSE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = FALSE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = NULL, as_short_link = FALSE)
names(gostres)[1] "result" "meta"
summary(gostres) Length Class Mode
result 14 data.frame list
meta 5 -none- list
head(gostres$result) query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
1 query_1 TRUE 2.813706e-30 88 22 16 0.7272727 0.18181818 GO:0048013 GO:BP ephrin receptor signaling pathway 18092 14554 GO:0007169
2 query_1 TRUE 1.077092e-21 285 22 16 0.7272727 0.05614035 GO:0007411 GO:BP axon guidance 18092 3305 GO:0007409, GO:0097485
3 query_1 TRUE 1.140559e-21 286 22 16 0.7272727 0.05594406 GO:0097485 GO:BP neuron projection guidance 18092 21970 GO:0006928, GO:0006935, GO:0048812
4 query_1 TRUE 6.823529e-18 489 22 16 0.7272727 0.03271984 GO:0007409 GO:BP axonogenesis 18092 3304 GO:0048667, GO:0048812, GO:0061564
5 query_1 TRUE 2.809503e-17 534 22 16 0.7272727 0.02996255 GO:0061564 GO:BP axon development 18092 18331 GO:0031175
6 query_1 TRUE 2.846480e-16 617 22 16 0.7272727 0.02593193 GO:0048667 GO:BP cell morphogenesis involved in neuron differentiation 18092 15031 GO:0000904, GO:0048666
names(gostres$meta)[1] "query_metadata" "result_metadata" "genes_metadata" "timestamp" "version"
With the parameter evcodes = TRUE, the result will include the evidence codes. In addition, a column “intersection” will appear showing the input gene IDs that intersect with the corresponding functional term. Note that this parameter can decrease the performance and make the query slower.
# Example with intersection (can be time consuming)
gostres2 <- gost(query = c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"),
organism = "hsapiens", ordered_query = FALSE,
multi_query = FALSE, significant = TRUE, exclude_iea = FALSE,
measure_underrepresentation = FALSE, evcodes = TRUE,
user_threshold = 0.05, correction_method = "g_SCS",
domain_scope = "annotated", custom_bg = NULL,
numeric_ns = "", sources = NULL)
head(gostres2$result) query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
1 query_1 TRUE 2.813706e-30 88 22 16 0.7272727 0.18181818 GO:0048013 GO:BP ephrin receptor signaling pathway 18092 14554 GO:0007169
2 query_1 TRUE 1.077092e-21 285 22 16 0.7272727 0.05614035 GO:0007411 GO:BP axon guidance 18092 3305 GO:0007409, GO:0097485
3 query_1 TRUE 1.140559e-21 286 22 16 0.7272727 0.05594406 GO:0097485 GO:BP neuron projection guidance 18092 21970 GO:0006928, GO:0006935, GO:0048812
4 query_1 TRUE 6.823529e-18 489 22 16 0.7272727 0.03271984 GO:0007409 GO:BP axonogenesis 18092 3304 GO:0048667, GO:0048812, GO:0061564
5 query_1 TRUE 2.809503e-17 534 22 16 0.7272727 0.02996255 GO:0061564 GO:BP axon development 18092 18331 GO:0031175
6 query_1 TRUE 2.846480e-16 617 22 16 0.7272727 0.02593193 GO:0048667 GO:BP cell morphogenesis involved in neuron differentiation 18092 15031 GO:0000904, GO:0048666
evidence_codes
1 IDA TAS IEA,ISS TAS IEA,TAS IEA,IDA IBA TAS,IGI TAS IEA,ISS TAS IEA,IDA TAS IEA,IDA TAS IEA,IGI ISS IBA TAS IEA,ISS TAS IEA,TAS,IDA TAS IEA,IDA TAS,TAS IEA,IDA TAS IEA,IBA TAS IEA
2 IBA,ISS IBA IEA,IBA,IBA IEA,ISS IBA IEA,ISS IBA IEA,IBA IEA,IBA,IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,IBA,IBA,IBA
3 IBA,ISS IBA IEA,IBA,IBA IEA,ISS IBA IEA,ISS IBA IEA,IBA IEA,IBA,IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,IBA,IBA,IBA
4 IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,ISS IBA IEA,IBA IEA,IBA,IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,IBA,IBA,IBA
5 ISS IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,ISS IBA IEA,IBA IEA,IBA,IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,IBA,IBA,IBA
6 IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,ISS IBA IEA,IBA IEA,IBA,IBA,ISS IBA IEA,IBA,ISS IBA IEA,ISS IBA IEA,IBA,IBA,IBA
intersection
1 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
2 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
3 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
4 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
5 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
6 ENSG00000044524,ENSG00000070886,ENSG00000080224,ENSG00000108947,ENSG00000116106,ENSG00000133216,ENSG00000135333,ENSG00000142627,ENSG00000143590,ENSG00000145242,ENSG00000146904,ENSG00000154928,ENSG00000182580,ENSG00000183317,ENSG00000196411,ENSG00000243364
The query results can also be gathered into a short-link to the g:Profiler web tool.
# Get a web link
gostres_link <- gost(query = c("X:1000:1000000", "rs17396340", "GO:0005005", "ENSG00000156103", "NLRP1"),
as_short_link = TRUE)
gostres_link[1] "https://biit.cs.ut.ee/gplink/l/3Sjz6iFLT6"
The function gost also allows to perform enrichment on multiple input gene lists. If the parameter multiquery = TRUE is used, then the results from all of the input queries are grouped according to term IDs for better comparison.
# Multiple gene lists
multi_gostres1 <- gost(query = list("chromX" = c("X:1000:1000000", "rs17396340",
"GO:0005005", "ENSG00000156103", "NLRP1"),
"chromY" = c("Y:1:10000000", "rs17396340",
"GO:0005005", "ENSG00000156103", "NLRP1")),
multi_query = FALSE)
head(multi_gostres1$result) query significant p_value term_size query_size intersection_size precision recall term_id source term_name effective_domain_size source_order parents
1 chromX TRUE 2.813706e-30 88 22 16 0.7272727 0.18181818 GO:0048013 GO:BP ephrin receptor signaling pathway 18092 14554 GO:0007169
2 chromX TRUE 1.077092e-21 285 22 16 0.7272727 0.05614035 GO:0007411 GO:BP axon guidance 18092 3305 GO:0007409, GO:0097485
3 chromX TRUE 1.140559e-21 286 22 16 0.7272727 0.05594406 GO:0097485 GO:BP neuron projection guidance 18092 21970 GO:0006928, GO:0006935, GO:0048812
4 chromX TRUE 6.823529e-18 489 22 16 0.7272727 0.03271984 GO:0007409 GO:BP axonogenesis 18092 3304 GO:0048667, GO:0048812, GO:0061564
5 chromX TRUE 2.809503e-17 534 22 16 0.7272727 0.02996255 GO:0061564 GO:BP axon development 18092 18331 GO:0031175
6 chromX TRUE 2.846480e-16 617 22 16 0.7272727 0.02593193 GO:0048667 GO:BP cell morphogenesis involved in neuron differentiation 18092 15031 GO:0000904, GO:0048666
multi_gostres2 <- gost(query = list("chromX" = c("X:1000:1000000", "rs17396340",
"GO:0005005", "ENSG00000156103", "NLRP1"),
"chromY" = c("Y:1:10000000", "rs17396340",
"GO:0005005", "ENSG00000156103", "NLRP1")),
multi_query = TRUE)
head(multi_gostres2$result) term_id p_values significant term_size query_sizes intersection_sizes source term_name effective_domain_size source_order parents
1 GO:0005005 7.162616e-48, 4.090780e-44 TRUE, TRUE 16 23, 33 16, 16 GO:MF transmembrane-ephrin receptor activity 18694 1548 GO:0005003
2 GO:0005003 6.933232e-45, 3.953788e-41 TRUE, TRUE 19 23, 33 16, 16 GO:MF ephrin receptor activity 18694 1546 GO:0004714
3 GO:0004714 2.581144e-33, 1.439607e-29 TRUE, TRUE 63 23, 33 16, 16 GO:MF transmembrane receptor protein tyrosine kinase activity 18694 1295 GO:0004713, GO:0019199
4 REAC:R-HSA-3928665 4.164481e-33, 1.836785e-32 TRUE, TRUE 49 20, 21 16, 16 REAC EPH-ephrin mediated repulsion of cells 10531 747 REAC:R-HSA-2682334
5 GO:0019199 2.920627e-31, 1.613380e-27 TRUE, TRUE 82 23, 33 16, 16 GO:MF transmembrane receptor protein kinase activity 18694 4539 GO:0004672, GO:0004888
6 GO:0048013 2.813706e-30, 8.827603e-26 TRUE, TRUE 88 22, 34 16, 16 GO:BP ephrin receptor signaling pathway 18092 14554 GO:0007169
The enrichment results can be visualized with a Manhattan-like-plot.
# Visualization
gostplot(gostres, capped = TRUE, interactive = TRUE)If interactive = FALSE, then the function returns a static ggplot object.
The function publish_gostplot takes the static plot object as an input and enables to highlight a selection of interesting terms from the results with numbers and table of results.
# Static ggplot object
p <- gostplot(gostres, capped = FALSE, interactive = FALSE)
# Highlight terms
pp <- publish_gostplot(p, highlight_terms = c("GO:0048013", "REAC:R-HSA-3928663"),
width = NA, height = NA, filename = NULL )The gost results can also be visualized as a table.
# Table
publish_gosttable(gostres, highlight_terms = gostres$result[c(1:2,10,100:102,120,124,125),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size", "intersection_size"),
filename = NULL)The same functions work also in case of multiquery results showing multiple Manhattan plots on top of each other and a multiple result table.
# Multiquery plot
gostplot(multi_gostres2, capped = TRUE, interactive = TRUE)# Multiquery table
publish_gosttable(multi_gostres1,
highlight_terms = multi_gostres1$result[c(1, 24, 82, 176, 204, 234),],
use_colors = TRUE,
show_columns = c("source", "term_name", "term_size"),
filename = NULL)In addition to the available GO, KEGG, etc data sources, users can upload their own custom data source using the Gene Matrix Transposed file format (GMT). The users can compose the files themselves or use pre-compiled gene sets from available dedicated websites like Molecular Signatures Database (MSigDB), etc.
GProfiler2 also has a tool to convert gene IDs between databases, etc. It is called g:Convert.
# Gene identifier conversion with gconvert
gconvert(query = c("REAC:R-HSA-3928664", "rs17396340", "NLRP1"), organism = "hsapiens",
target="ENSG", mthreshold = Inf, filter_na = TRUE) input_number input target_number target name description namespace
1 1 REAC:R-HSA-3928664 1.1 ENSG00000010810 FYN FYN proto-oncogene, Src family tyrosine kinase [Source:HGNC Symbol;Acc:HGNC:4037] REAC
2 1 REAC:R-HSA-3928664 1.10 ENSG00000133216 EPHB2 EPH receptor B2 [Source:HGNC Symbol;Acc:HGNC:3393] REAC
3 1 REAC:R-HSA-3928664 1.11 ENSG00000136238 RAC1 Rac family small GTPase 1 [Source:HGNC Symbol;Acc:HGNC:9801] REAC
4 1 REAC:R-HSA-3928664 1.12 ENSG00000137575 SDCBP syndecan binding protein [Source:HGNC Symbol;Acc:HGNC:10662] REAC
5 1 REAC:R-HSA-3928664 1.13 ENSG00000149269 PAK1 p21 (RAC1) activated kinase 1 [Source:HGNC Symbol;Acc:HGNC:8590] REAC
6 1 REAC:R-HSA-3928664 1.14 ENSG00000154928 EPHB1 EPH receptor B1 [Source:HGNC Symbol;Acc:HGNC:3392] REAC
7 1 REAC:R-HSA-3928664 1.15 ENSG00000180370 PAK2 p21 (RAC1) activated kinase 2 [Source:HGNC Symbol;Acc:HGNC:8591] REAC
8 1 REAC:R-HSA-3928664 1.16 ENSG00000182580 EPHB3 EPH receptor B3 [Source:HGNC Symbol;Acc:HGNC:3394] REAC
9 1 REAC:R-HSA-3928664 1.17 ENSG00000196411 EPHB4 EPH receptor B4 [Source:HGNC Symbol;Acc:HGNC:3395] REAC
10 1 REAC:R-HSA-3928664 1.2 ENSG00000071051 NCK2 NCK adaptor protein 2 [Source:HGNC Symbol;Acc:HGNC:7665] REAC
11 1 REAC:R-HSA-3928664 1.3 ENSG00000077264 PAK3 p21 (RAC1) activated kinase 3 [Source:HGNC Symbol;Acc:HGNC:8592] REAC
12 1 REAC:R-HSA-3928664 1.4 ENSG00000090776 EFNB1 ephrin B1 [Source:HGNC Symbol;Acc:HGNC:3226] REAC
13 1 REAC:R-HSA-3928664 1.5 ENSG00000101608 MYL12A myosin light chain 12A [Source:HGNC Symbol;Acc:HGNC:16701] REAC
14 1 REAC:R-HSA-3928664 1.6 ENSG00000102606 ARHGEF7 Rho guanine nucleotide exchange factor 7 [Source:HGNC Symbol;Acc:HGNC:15607] REAC
15 1 REAC:R-HSA-3928664 1.7 ENSG00000108262 GIT1 GIT ArfGAP 1 [Source:HGNC Symbol;Acc:HGNC:4272] REAC
16 1 REAC:R-HSA-3928664 1.8 ENSG00000108947 EFNB3 ephrin B3 [Source:HGNC Symbol;Acc:HGNC:3228] REAC
17 1 REAC:R-HSA-3928664 1.9 ENSG00000125266 EFNB2 ephrin B2 [Source:HGNC Symbol;Acc:HGNC:3227] REAC
18 2 rs17396340 2.1 ENSG00000054523 KIF1B kinesin family member 1B [Source:HGNC Symbol;Acc:HGNC:16636]
19 3 NLRP1 3.1 ENSG00000091592 NLRP1 NLR family pyrin domain containing 1 [Source:HGNC Symbol;Acc:HGNC:14374] ENTREZGENE,HGNC,UNIPROT_GN,WIKIGENE
GProfiler2 also has a tool to map gene IDs between species. It is called g:Orth.
# Mapping homologous genes across related organisms with gorth
gorth(query = c("Klf4", "Sox2", "71950"), source_organism = "mmusculus",
target_organism = "hsapiens", mthreshold = Inf, filter_na = TRUE,
numeric_ns = "ENTREZGENE_ACC") input_number input input_ensg ensg_number ortholog_name ortholog_ensg description
1 1 Klf4 ENSMUSG00000003032 1.1.1 KLF4 ENSG00000136826 Kruppel like factor 4 [Source:HGNC Symbol;Acc:HGNC:6348]
2 2 Sox2 ENSMUSG00000074637 2.1.1 SOX2 ENSG00000181449 SRY-box transcription factor 2 [Source:HGNC Symbol;Acc:HGNC:11195]
3 3 71950 ENSMUSG00000012396 3.1.1 NANOG ENSG00000111704 Nanog homeobox [Source:HGNC Symbol;Acc:HGNC:20857]
4 3 71950 ENSMUSG00000012396 3.1.2 NANOGP8 ENSG00000255192 Nanog homeobox retrogene P8 [Source:HGNC Symbol;Acc:HGNC:23106]
It is possible to use older (archived) versions or the beta version of gProfiler.
# Accessing archived version (gprofiler2 package is only compatible with versions e94_eg41_p11 and higher)
set_base_url("http://biit.cs.ut.ee/gprofiler_archive3/e95_eg42_p13")
# Accessing the beta release
set_base_url("http://biit.cs.ut.ee/gprofiler_beta")
# Check the current version
get_base_url()[1] "http://biit.cs.ut.ee/gprofiler_beta"
For the sake of traceability, store the specifications of your R environment in the report, with the command sessionInfo(). This will indicate the version of R as well as of all the libraries used in this notebook.
sessionInfo()R version 4.0.2 (2020-06-22)
Platform: x86_64-conda_cos6-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)
Matrix products: default
BLAS/LAPACK: /shared/ifbstor1/software/miniconda/envs/r-4.0.2/lib/libopenblasp-r0.3.10.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] gprofiler2_0.2.0 knitr_1.30
loaded via a namespace (and not attached):
[1] tidyselect_1.1.0 xfun_0.20 purrr_0.3.4 colorspace_2.0-0 vctrs_0.3.6 generics_0.1.0 htmltools_0.5.1.1 viridisLite_0.3.0 yaml_2.2.1 utf8_1.1.4 plotly_4.9.3 rlang_0.4.10 pillar_1.5.1 later_1.1.0.1 glue_1.4.2 DBI_1.1.1
[17] lifecycle_1.0.0 stringr_1.4.0 munsell_0.5.0 gtable_0.3.0 htmlwidgets_1.5.3 evaluate_0.14 labeling_0.4.2 fastmap_1.1.0 httpuv_1.5.5 crosstalk_1.1.1 fansi_0.4.2 Rcpp_1.0.6 xtable_1.8-4 scales_1.1.1 promises_1.2.0.1 debugme_1.1.0
[33] jsonlite_1.7.2 farver_2.1.0 mime_0.10 gridExtra_2.3 ggplot2_3.3.3 digest_0.6.27 stringi_1.5.3 dplyr_1.0.5 shiny_1.5.0 grid_4.0.2 tools_4.0.2 bitops_1.0-6 magrittr_2.0.1 lazyeval_0.2.2 RCurl_1.98-1.2 tibble_3.1.0
[49] crayon_1.4.1 tidyr_1.1.3 pkgconfig_2.0.3 ellipsis_0.3.1 data.table_1.14.0 assertthat_0.2.1 rmarkdown_2.5 httr_1.4.2 R6_2.5.0 compiler_4.0.2